    Generalized Zurek's bound on the cost of an individual classical or quantum computation

    We consider the minimal thermodynamic cost of an individual computation, where a single input $x$ is mapped into a single output $y$. In prior work, Zurek proposed that this cost was given by $K(x\vert y)$, the conditional Kolmogorov complexity of $x$ given $y$ (up to an additive constant which does not depend on $x$ or $y$). However, this result was derived from an informal argument, applied only to deterministic computations, and had an arbitrary dependence on the choice of protocol (via the additive constant). Here we use stochastic thermodynamics to derive a generalized version of Zurek's bound from a rigorous Hamiltonian formulation. Our bound applies to all quantum and classical processes, whether noisy or deterministic, and it explicitly captures the dependence on the protocol. We show that $K(x\vert y)$ is a minimal cost of mapping $x$ to $y$ that must be paid using some combination of heat, noise, and protocol complexity, implying a tradeoff between these three resources. Our result is a kind of "algorithmic fluctuation theorem" with implications for the relationship between the Second Law and the Physical Church-Turing thesis.

    Dependence of dissipation on the initial distribution over states

    We analyze how the amount of work dissipated by a fixed nonequilibrium process depends on the initial distribution over states. Specifically, we compare the amount of dissipation when the process is used with some specified initial distribution to the minimal amount of dissipation possible for any initial distribution. We show that the difference between those two amounts of dissipation is given by a simple information-theoretic function that depends only on the initial and final state distributions. Crucially, this difference is independent of the details of the process relating those distributions. We then consider how dissipation depends on the initial distribution for a 'computer', i.e., a nonequilibrium process whose dynamics over coarse-grained macrostates implement some desired input-output map. We show that our results still apply when stated in terms of distributions over the computer's coarse-grained macrostates. This can be viewed as a novel thermodynamic cost of computation, reflecting changes in the distribution over inputs rather than the logical dynamics of the computation.
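
As a rough illustration of an information-theoretic function that depends only on initial and final distributions, the sketch below computes the drop in Kullback-Leibler divergence, D(p0||q0) - D(P p0||P q0), under a fixed stochastic map P. Treating this quantity as the extra dissipation is an assumption made here for illustration; the abstract does not state the formula. The matrix `P`, the distributions `p0` and `q0`, and the function names are all hypothetical.

```python
import numpy as np

def kl_divergence(p, q):
    """D(p || q) in nats; assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def extra_dissipation(P, p0, q0):
    """Drop in KL divergence, D(p0||q0) - D(P p0 || P q0), in nats.

    P[j, i] is the probability of ending in state j when starting in
    state i, so every column of P sums to 1.
    Interpreting the result as extra dissipation is an assumption.
    """
    P = np.asarray(P, dtype=float)
    p1 = P @ np.asarray(p0, dtype=float)  # final distribution actually reached
    q1 = P @ np.asarray(q0, dtype=float)  # final distribution from the reference start
    return kl_divergence(p0, q0) - kl_divergence(p1, q1)

# Hypothetical two-state process that pushes most probability into state 0.
P = np.array([[0.9, 0.8],
              [0.1, 0.2]])
q0 = np.array([0.5, 0.5])  # assumed dissipation-minimizing initial distribution
p0 = np.array([0.9, 0.1])  # initial distribution actually used

extra = extra_dissipation(P, p0, q0)
print(f"KL drop (nats): {extra:.4f}")
```

The value depends on the process only through the distributions it produces, and it is non-negative by the data processing inequality. For an invertible deterministic map (a permutation matrix) it is zero.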

    Estimating Mixture Entropy with Pairwise Distances

    Mixture distributions arise in many parametric and non-parametric settings -- for example, in Gaussian mixture models and in non-parametric estimation. It is often necessary to compute the entropy of a mixture, but, in most cases, this quantity has no closed-form expression, making some form of approximation necessary. We propose a family of estimators based on a pairwise distance function between mixture components, and show that this estimator class has many attractive properties. For many distributions of interest, the proposed estimators are efficient to compute, differentiable in the mixture parameters, and become exact when the mixture components are clustered. We prove this family includes lower and upper bounds on the mixture entropy. The Chernoff $\alpha$-divergence gives a lower bound when chosen as the distance function, with the Bhattacharyya distance providing the tightest lower bound for components that are symmetric and members of a location family. The Kullback-Leibler divergence gives an upper bound when used as the distance function. We provide closed-form expressions of these bounds for mixtures of Gaussians, and discuss their applications to the estimation of mutual information. We then demonstrate that our bounds are significantly tighter than well-known existing bounds using numeric simulations. This estimator class is very useful in optimization problems involving maximization/minimization of entropy and mutual information, such as MaxEnt and rate distortion problems.
    Comment: Corrects several errata in published version, in particular in Section V (bounds on mutual information)
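
A minimal sketch of a pairwise-distance entropy estimator for a Gaussian mixture follows. It assumes the estimator takes the form H(X|C) - sum_i c_i ln sum_j c_j exp(-D(p_i||p_j)), which the abstract does not spell out, and it assumes all components share one covariance matrix so that both distances have simple closed forms. The function names and example parameters are hypothetical.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (nats) of a Gaussian with covariance `cov`."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

def pairwise_entropy_estimate(weights, means, cov, distance="kl"):
    """Pairwise-distance estimate of mixture entropy (assumed form, see lead-in).

    All components are Gaussians sharing the covariance `cov`, so that
      KL divergence          = (1/2) * squared Mahalanobis distance
      Bhattacharyya distance = (1/8) * squared Mahalanobis distance
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    diffs = means[:, None, :] - means[None, :, :]          # shape (n, n, d)
    prec = np.linalg.inv(cov)
    maha = np.einsum("ijk,kl,ijl->ij", diffs, prec, diffs)  # squared distances
    scale = {"kl": 0.5, "bhattacharyya": 0.125}[distance]
    D = scale * maha
    # H(X|C): every component has the same entropy under a shared covariance.
    cond_entropy = gaussian_entropy(cov)
    return cond_entropy - float(np.sum(weights * np.log(np.exp(-D) @ weights)))

# Hypothetical two-component mixture in two dimensions.
weights = np.array([0.3, 0.7])
means = np.array([[0.0, 0.0], [2.0, 1.0]])
cov = np.eye(2)

upper = pairwise_entropy_estimate(weights, means, cov, distance="kl")
lower = pairwise_entropy_estimate(weights, means, cov, distance="bhattacharyya")
print(f"lower bound: {lower:.4f}  upper bound: {upper:.4f}")
```

When all component means coincide, both estimates reduce to the entropy of a single Gaussian, which matches the statement that the estimators become exact for clustered components.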

    A novel approach to multivariate redundancy and synergy

    Consider a situation in which a set of $n$ "source" random variables $X_{1},\dots,X_{n}$ have information about some "target" random variable $Y$. For example, in neuroscience $Y$ might represent the state of an external stimulus and $X_{1},\dots,X_{n}$ the activity of $n$ different brain regions. Recent work in information theory has considered how to decompose the information that the sources $X_{1},\dots,X_{n}$ provide about the target $Y$ into separate terms such as (1) the "redundant information" that is shared among all of the sources, (2) the "unique information" that is provided only by a single source, (3) the "synergistic information" that is provided by all sources only when considered jointly, and (4) the "union information" that is provided by at least one source. We propose a novel framework for deriving such a decomposition that can be applied to any number of sources. Our measures are motivated in three distinct ways: via a formal analogy to intersection and union operators in set theory, via a decision-theoretic operationalization based on Blackwell's theorem, and via an axiomatic derivation. A key aspect of our approach is that we relax the assumption that measures of redundancy and union information should be related by the inclusion-exclusion principle. We discuss relations to previous proposals as well as possible generalizations.

    Caveats for information bottleneck in deterministic scenarios

    Information bottleneck (IB) is a method for extracting information from one random variable $X$ that is relevant for predicting another random variable $Y$. To do so, IB identifies an intermediate "bottleneck" variable $T$ that has low mutual information $I(X;T)$ and high mutual information $I(Y;T)$. The "IB curve" characterizes the set of bottleneck variables that achieve maximal $I(Y;T)$ for a given $I(X;T)$, and is typically explored by maximizing the "IB Lagrangian", $I(Y;T) - \beta I(X;T)$. In some cases, $Y$ is a deterministic function of $X$, including many classification problems in supervised learning where the output class $Y$ is a deterministic function of the input $X$. We demonstrate three caveats when using IB in any situation where $Y$ is a deterministic function of $X$: (1) the IB curve cannot be recovered by maximizing the IB Lagrangian for different values of $\beta$; (2) there are "uninteresting" trivial solutions at all points of the IB curve; and (3) for multi-layer classifiers that achieve low prediction error, different layers cannot exhibit a strict trade-off between compression and prediction, contrary to a recent proposal. We also show that when $Y$ is a small perturbation away from being a deterministic function of $X$, these three caveats arise in an approximate way. To address problem (1), we propose a functional that, unlike the IB Lagrangian, can recover the IB curve in all cases. We demonstrate the three caveats on the MNIST dataset.
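
The IB Lagrangian can be evaluated directly for small discrete problems. The sketch below does so for a hypothetical deterministic map and a simple one-parameter family of bottleneck variables: $T$ copies $Y = f(X)$ with probability `a` and otherwise emits an extra "blank" symbol. This family, the distribution `p_x`, the map `f`, and the function names are assumptions chosen for illustration, not constructions quoted from the abstract.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (nats) of a joint distribution given as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pa @ pb)[mask])))

def ib_lagrangian(p_x, f, n_y, a, beta):
    """I(Y;T) - beta * I(X;T) for the assumed family of bottlenecks.

    T equals f(X) with probability `a`, otherwise a 'blank' symbol (index n_y).
    Returns (lagrangian, I(X;T), I(Y;T)).
    """
    n_x = len(p_x)
    enc = np.zeros((n_x, n_y + 1))        # encoder q(t|x)
    enc[np.arange(n_x), f] = a
    enc[:, n_y] = 1.0 - a
    joint_xt = p_x[:, None] * enc         # p(x, t)
    joint_yt = np.zeros((n_y, n_y + 1))   # p(y, t), pooling x by y = f(x)
    np.add.at(joint_yt, f, joint_xt)
    i_xt = mutual_information(joint_xt)
    i_yt = mutual_information(joint_yt)
    return i_yt - beta * i_xt, i_xt, i_yt

# Hypothetical example: four inputs, two classes, Y = f(X) deterministic.
p_x = np.array([0.1, 0.2, 0.3, 0.4])
f = np.array([0, 0, 1, 1])
p_y = np.array([0.3, 0.7])
h_y = -float(np.sum(p_y * np.log(p_y)))

for a in (0.0, 0.5, 1.0):
    lag, i_xt, i_yt = ib_lagrangian(p_x, f, 2, a, beta=0.5)
    print(f"a={a:.1f}  I(X;T)={i_xt:.4f}  I(Y;T)={i_yt:.4f}  Lagrangian={lag:.4f}")
```

Within this family, $I(X;T)$ and $I(Y;T)$ both equal `a` times $H(Y)$, so the Lagrangian is linear in `a` and, for any $\beta \neq 1$, is maximized only at an endpoint. This is consistent with caveat (1), though it is a toy check rather than the paper's argument.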

    Semantic information, autonomous agency, and nonequilibrium statistical physics

    Shannon information theory provides various measures of so-called "syntactic information", which reflect the amount of statistical correlation between systems. In contrast, the concept of "semantic information" refers to those correlations which carry significance or "meaning" for a given system. Semantic information plays an important role in many fields, including biology, cognitive science, and philosophy, and there has been a long-standing interest in formulating a broadly applicable and formal theory of semantic information. In this paper we introduce such a theory. We define semantic information as the syntactic information that a physical system has about its environment which is causally necessary for the system to maintain its own existence. "Causal necessity" is defined in terms of counter-factual interventions which scramble correlations between the system and its environment, while "maintaining existence" is defined in terms of the system's ability to keep itself in a low entropy state. We also use recent results in nonequilibrium statistical physics to analyze semantic information from a thermodynamic point of view. Our framework is grounded in the intrinsic dynamics of a system coupled to an environment, and is applicable to any physical system, living or otherwise. It leads to formal definitions of several concepts that have been intuitively understood to be related to semantic information, including "value of information", "semantic content", and "agency".

    Nonlinear Information Bottleneck

    Information bottleneck (IB) is a technique for extracting information in one random variable $X$ that is relevant for predicting another random variable $Y$. IB works by encoding $X$ in a compressed "bottleneck" random variable $M$ from which $Y$ can be accurately decoded. However, finding the optimal bottleneck variable involves a difficult optimization problem, which until recently has been considered for only two limited cases: discrete $X$ and $Y$ with small state spaces, and continuous $X$ and $Y$ with a Gaussian joint distribution (in which case optimal encoding and decoding maps are linear). We propose a method for performing IB on arbitrarily-distributed discrete and/or continuous $X$ and $Y$, while allowing for nonlinear encoding and decoding maps. Our approach relies on a novel non-parametric upper bound for mutual information. We describe how to implement our method using neural networks. We then show that it achieves better performance than the recently-proposed "variational IB" method on several real-world datasets.

    Modularity and the spread of perturbations in complex dynamical systems

    We propose a method to decompose dynamical systems based on the idea that modules constrain the spread of perturbations. We find partitions of system variables that maximize 'perturbation modularity', defined as the autocovariance of coarse-grained perturbed trajectories. The measure effectively separates the fast intramodular from the slow intermodular dynamics of perturbation spreading (in this respect, it is a generalization of the 'Markov stability' method of network community detection). Our approach captures variation of modular organization across different system states, time scales, and in response to different kinds of perturbations: aspects of modularity which are all relevant to real-world dynamical systems. It offers a principled alternative to detecting communities in networks of statistical dependencies between system variables (e.g., 'relevance networks' or 'functional networks'). Using coupled logistic maps, we demonstrate that the method uncovers hierarchical modular organization planted in a system's coupling matrix. Additionally, in homogeneously-coupled map lattices, it identifies the presence of self-organized modularity that depends on the initial state, dynamical parameters, and type of perturbations. Our approach offers a powerful tool for exploring the modular organization of complex dynamical systems.

    Thermodynamics of computing with circuits

    Digital computers implement computations using circuits, as do many naturally occurring systems (e.g., gene regulatory networks). The topology of any such circuit restricts which variables may be physically coupled during the operation of the circuit. We investigate how such restrictions on the physical coupling affect the thermodynamic costs of running the circuit. To do this, we first calculate the minimal additional entropy production that arises when we run a given gate in a circuit. We then build on this calculation to analyze how the thermodynamic costs of implementing a computation with a full circuit, comprising multiple connected gates, depend on the topology of that circuit. This analysis provides a rich new set of optimization problems that must be addressed by any designer of a circuit who wishes to minimize thermodynamic costs.
    Comment: 26 pages (6 of appendices), 5 figures
